IGNOREME: Iscp integration #22519
Closed
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #21835
What this PR does / why we need it:
index update with ISCP
PR Type
Enhancement, Feature, Tests, Bug fix
Description
• Major Feature: Implements ISCP (Index Synchronization Change Processing) with CDC (Change Data Capture) support for vector indexes
• HNSW Generic Types: Refactors HNSW vector index implementation to use generic types with the types.RealNumbers constraint
• Async Index Support: Adds ASYNC keyword support for fulltext and vector indexes with asynchronous processing capabilities
• CDC Infrastructure: Implements comprehensive CDC synchronization for HNSW, IVF-flat, and fulltext indexes with SQL generation
• Index Consumer: Adds IndexConsumer for processing index synchronization data in both snapshot and tail modes
• DDL Integration: Integrates ISCP job management into DDL operations (CREATE, DROP, ALTER TABLE) with CDC task lifecycle
• SQL Writer Framework: Implements IndexSqlWriter interface with algorithm-specific implementations for different index types
• Test Coverage: Adds comprehensive test suites for all new CDC, ISCP, and async functionality
• Bug Fixes: Fixes null handling in watermark updater and fulltext index parameter processing
• Enhanced Error Messages: Improves vector dimension mismatch error messages for better clarity
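To make the generic-type refactor concrete, here is a minimal sketch of a function written against a real-number constraint. The RealNumbers interface below is a local stand-in whose shape is assumed; the actual types.RealNumbers definition in the repository may differ, and L2DistanceSq is a hypothetical helper, not a function from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// RealNumbers is a local stand-in for the constraint named in the PR.
// Its exact type set in the real code base is an assumption here.
type RealNumbers interface {
	~float32 | ~float64
}

// L2DistanceSq (hypothetical helper) returns the squared Euclidean
// distance between two vectors of the same element type.
func L2DistanceSq[T RealNumbers](a, b []T) (T, error) {
	if len(a) != len(b) {
		return 0, errors.New("vectors have different lengths")
	}
	var sum T
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum, nil
}

func main() {
	// The same code path serves both float32 and float64 vectors.
	d32, _ := L2DistanceSq([]float32{1, 2}, []float32{4, 6})
	d64, _ := L2DistanceSq([]float64{0, 0}, []float64{3, 4})
	fmt.Println(d32, d64)
}
```

A single generic implementation like this is the usual motivation for replacing float32-only structs with type-parameterized ones.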
File Walkthrough
17 files
index_sqlwriter.go
Add index SQL writer implementations for vector indexes
pkg/iscp/index_sqlwriter.go
• Implements IndexSqlWriter interface with three concrete implementations: FulltextSqlWriter, IvfflatSqlWriter, and HnswSqlWriter
• Provides SQL generation for different vector index algorithms (fulltext, IVFFLAT, HNSW) with CDC operations (insert, upsert, delete)
• Includes factory function NewIndexSqlWriter to create appropriate writer based on algorithm type
• Implements generic type support for HNSW with HnswSqlWriter[T types.RealNumbers]
sync.go
Add HNSW vector index CDC synchronization support
pkg/vectorindex/hnsw/sync.go
• Implements CDC synchronization functionality for HNSW vector indexes with CdcSync function
• Provides HnswSync struct for managing index updates, insertions, and deletions
• Includes parallel processing support for bulk operations and sequential updates
• Generates SQL statements for metadata and index table updates
model.go
Add HNSW vector index model with generic type support
pkg/vectorindex/hnsw/model.go
• Implements HnswModel struct for HNSW vector index operations with generic type support
• Provides methods for index building, loading, saving, searching, and CDC operations
• Includes file-based persistence with chunked loading from database
• Supports concurrent operations with atomic counters and proper resource management
index_consumer.go
Implement index consumer for CDC data processing
pkg/iscp/index_consumer.go
• Implemented IndexConsumer for processing index synchronization data
• Handles both snapshot and tail (CDC) data processing modes
• Manages SQL execution through channels and transaction handling
• Supports different index algorithms (HNSW, IVF-flat, fulltext)
ddl.go
Integrate ISCP CDC tasks into DDL operations
pkg/sql/compile/ddl.go
• Integrated ISCP job management into DDL operations
• Added CDC task creation/deletion for index operations
• Updated table operations (create, drop, truncate, alter) to handle index CDC tasks
• Added support for async index updates with PITR integration
cdc_util.go
Add CDC utilities for index synchronization management
pkg/sql/compile/cdc_util.go
• Added utility functions for managing CDC tasks and PITR for indexes
• Implements job registration/unregistration with ISCP system
• Handles creation and deletion of index-specific CDC tasks
• Manages PITR lifecycle for index synchronization
types.go
Add CDC data structures and async parameter support
pkg/vectorindex/types.go
• Added CDC-related data structures and constants
• Implemented VectorIndexCdc and VectorIndexCdcEntry with generic types
• Added CDC operation types (INSERT, UPSERT, DELETE)
• Enhanced parameter structures with async support
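As an illustration of the CDC structures described for pkg/vectorindex/types.go, the sketch below shows one way a generic batch of insert, upsert, and delete entries could be collected and serialized to JSON. The type names CdcBatch and CdcEntry, the field names, the JSON keys, and the single-letter operation codes are all assumptions made for this example; they are not taken from the PR's VectorIndexCdc or VectorIndexCdcEntry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RealNumbers is a local stand-in constraint (assumed shape).
type RealNumbers interface{ ~float32 | ~float64 }

// Operation codes are hypothetical; the PR only names the three kinds.
const (
	OpInsert = "I"
	OpUpsert = "U"
	OpDelete = "D"
)

// CdcEntry is one change: an operation, a primary key, and (except for
// deletes) the vector payload. Field names and JSON keys are assumed.
type CdcEntry[T RealNumbers] struct {
	Op  string `json:"t"`
	PK  int64  `json:"pk"`
	Vec []T    `json:"v,omitempty"`
}

// CdcBatch accumulates entries in arrival order.
type CdcBatch[T RealNumbers] struct {
	Entries []CdcEntry[T] `json:"cdc"`
}

func (b *CdcBatch[T]) Insert(pk int64, v []T) {
	b.Entries = append(b.Entries, CdcEntry[T]{Op: OpInsert, PK: pk, Vec: v})
}

func (b *CdcBatch[T]) Upsert(pk int64, v []T) {
	b.Entries = append(b.Entries, CdcEntry[T]{Op: OpUpsert, PK: pk, Vec: v})
}

func (b *CdcBatch[T]) Delete(pk int64) {
	b.Entries = append(b.Entries, CdcEntry[T]{Op: OpDelete, PK: pk})
}

// ToJSON serializes the batch so it can be passed as a string argument.
func (b *CdcBatch[T]) ToJSON() (string, error) {
	out, err := json.Marshal(b)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	var b CdcBatch[float32]
	b.Insert(1, []float32{0.5, 1})
	b.Delete(2)
	s, err := b.ToJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Serializing the batch as JSON is what would let it travel as a single string parameter, which matches the "CDC JSON parameters" the walkthrough mentions for the HNSW update function.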
func_hnsw.go
Add HNSW CDC update function implementation
pkg/sql/plan/function/func_hnsw.go
• New function hnswCdcUpdate for handling HNSW CDC (Change Data Capture) updates
• Processes database, table, type, dimension, and CDC JSON parameters
• Calls hnsw.CdcSync for float32 vector types with logging
ddl_index_algo.go
Implement async index support with CDC task creation
pkg/sql/compile/ddl_index_algo.go
• Added async index support for fulltext indexes with CDC task creation
• Enhanced HNSW index handling to register CDC update tasks
• Added logging import and async parameter checking
sqlexec.go
Add transaction execution utility function
pkg/vectorindex/sqlexec/sqlexec.go
• Added new RunTxn function for executing transactions with proper context setup
• Handles account ID extraction and SQL executor configuration
• Provides transaction execution with proper options and error handling
create.go
Add async option support to index creation syntax
pkg/sql/parsers/tree/create.go
• Added Async boolean field to IndexOption struct
• Enhanced Format method to output "ASYNC " when async flag is true
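The Format change described for create.go can be pictured with a reduced example. This is only a sketch: the IndexOption below is a trimmed-down stand-in with a hypothetical Comment field, and it writes to a strings.Builder rather than whatever formatting context the real parser tree uses.

```go
package main

import (
	"fmt"
	"strings"
)

// IndexOption is a simplified stand-in for the parser's option struct.
// Only Async comes from the PR description; Comment is hypothetical.
type IndexOption struct {
	Async   bool
	Comment string
}

// Format writes the option back out as SQL text, emitting "ASYNC "
// (with a trailing space) when the flag is set.
func (o *IndexOption) Format(sb *strings.Builder) {
	if o.Async {
		sb.WriteString("ASYNC ")
	}
	if o.Comment != "" {
		sb.WriteString("COMMENT ")
		sb.WriteString(o.Comment)
	}
}

// formatOption is a small convenience wrapper for the example.
func formatOption(o *IndexOption) string {
	var sb strings.Builder
	o.Format(&sb)
	return sb.String()
}

func main() {
	fmt.Printf("%q\n", formatOption(&IndexOption{Async: true}))
	fmt.Printf("%q\n", formatOption(&IndexOption{Comment: "'demo'"}))
}
```

Emitting the keyword in uppercase with a trailing space keeps the round-tripped statement parseable when further options follow it.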
list_builtIn.go
Register HNSW CDC update built-in function
pkg/sql/plan/function/list_builtIn.go
• Added new HNSW_CDC_UPDATE function definition with proper overload
• Function accepts 5 parameters including database, table, type, dimension, and CDC data
• Returns uint64 type and uses hnswCdcUpdate as execution logic
function_id.go
Register HNSW CDC update function identifier
pkg/sql/plan/function/function_id.go
• Added HNSW_CDC_UPDATE function ID constant (349)
• Updated FUNCTION_END_NUMBER to 350
• Registered "hnsw_cdc_update" function name mapping
keywords.go
Add ASYNC keyword to MySQL parser
pkg/sql/parsers/dialect/mysql/keywords.go
• Added "async" keyword mapping to ASYNC token
• Registered new keyword in the MySQL dialect parser
consumer.go
Add index sync consumer type support
pkg/iscp/consumer.go
• Added support for ConsumerType_IndexSync consumer type
• Returns NewIndexConsumer for index synchronization operations
types.go
Add async parameter to fulltext parser configuration
pkg/fulltext/types.go
• Added Async field to FullTextParserParam struct
• Enhanced fulltext parser parameters to support async operations
mysql_sql.y
Add ASYNC syntax support to MySQL grammar
pkg/sql/parsers/dialect/mysql/mysql_sql.y
• Added ASYNC token definition and grammar rules
• Enhanced index option parsing to handle async flag
• Added async keyword to non-reserved keywords list
19 files
util.go
Enable comprehensive data type support in ISCP utilities
pkg/iscp/util.go
• Uncomments and enables support for additional data types in extractRowFromVector and convertColIntoSql functions
• Adds support for JSON, bit, array types, date/time types, decimal types, UUID, and other specialized types
• Includes appendHex function for binary data formatting
• Improves NULL value handling with proper type casting
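The walkthrough only names appendHex and says it formats binary data, so the following is a guess at the general idea rather than the PR's code: append a binary value to a SQL buffer as a hex literal. The signature and the MySQL-style x'...' literal form are assumptions of this sketch.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// appendHex appends src to dst as a hex string literal of the form
// x'...'. Both the signature and the literal style are assumed here.
func appendHex(dst, src []byte) []byte {
	dst = append(dst, "x'"...)
	dst = append(dst, hex.EncodeToString(src)...)
	return append(dst, '\'')
}

func main() {
	// Build a fragment of a VALUES list containing a binary column.
	buf := []byte("VALUES (")
	buf = appendHex(buf, []byte{0xde, 0xad, 0xbe, 0xef})
	buf = append(buf, ')')
	fmt.Println(string(buf))
}
```

Hex-encoding sidesteps quoting and escaping problems that arise when raw bytes are embedded directly in generated SQL text.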
alter.go
Add ISCP job management for ALTER TABLE operations
pkg/sql/compile/alter.go
• Adds ISCP job cleanup during ALTER TABLE operations
• Includes DropAllIndexCdcTasks call to remove CDC tasks for temporary tables
• Adds fulltext index handling in the reindex process
• Improves error handling and logging for ALTER TABLE copy operations
search.go
Refactor HNSW search with generic types and model abstraction
pkg/vectorindex/hnsw/search.go
• Refactored HnswSearch to use generic types with types.RealNumbers constraint
• Replaced HnswSearchIndex with HnswModel[T] for better type safety
• Simplified search implementation by removing file loading logic
• Updated metadata loading to use generic LoadMetadata function
build_dml_util.go
Add async index support to DML operations
pkg/sql/plan/build_dml_util.go
• Added async index support to skip synchronous index operations
• Updated multi-table index handling to check for async configuration
• Modified fulltext and IVF index processing to respect async settings
• Enhanced MultiTableIndex structure with IndexAlgoParams field
fulltext.go
Enhance fulltext tokenization with composite key support
pkg/sql/plan/fulltext.go
• Enhanced fulltext index tokenization to support both table and values scans
• Added support for composite primary keys in fulltext operations
• Improved parameter handling for different scan types
• Added primary key type extraction for values-based operations
secondary_index_utils.go
Add async parameter support to index configuration
pkg/catalog/secondary_index_utils.go
• Added async parameter support to index configuration
• Implemented IsIndexAsync function to check async settings
• Enhanced parameter parsing to handle async flag
• Updated fulltext and vector index parameter handling
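A check such as IsIndexAsync could plausibly work along the following lines. This sketch assumes the index algorithm parameters are stored as a flat JSON object of string values and that the flag lives under an "async" key; neither the key name nor the value encoding is confirmed by the PR text, and isIndexAsync here is a local illustration rather than the catalog function itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// isIndexAsync reports whether the serialized index parameters request
// asynchronous maintenance. Assumes a JSON object with string values
// and an "async" key (both assumptions of this sketch).
func isIndexAsync(algoParams string) (bool, error) {
	if algoParams == "" {
		return false, nil // no parameters means a synchronous index
	}
	var params map[string]string
	if err := json.Unmarshal([]byte(algoParams), &params); err != nil {
		return false, err
	}
	v, ok := params["async"]
	if !ok {
		return false, nil
	}
	return strconv.ParseBool(v)
}

func main() {
	async, err := isIndexAsync(`{"op_type":"vector_l2_ops","async":"true"}`)
	fmt.Println(async, err)
	async, err = isIndexAsync(`{"op_type":"vector_l2_ops"}`)
	fmt.Println(async, err)
}
```

Treating a missing key as false keeps indexes created before the flag existed on the synchronous path.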
build_show_util.go
Add async parameter support in CREATE TABLE SQL construction
pkg/sql/plan/build_show_util.go
• Enhanced ConstructCreateTableSQL to handle async parameter from index algo params
• Added JSON parsing for async flag and appends "ASYNC" to index string when true
• Improved error handling for JSON parsing operations
func_cast.go
Enhance array casting with dimension validation
pkg/sql/plan/function/func_cast.go
• Enhanced strToArray function with dimension validation
• Added bypass for max dimension check and proper error handling
• Improved array conversion with dimension mismatch detection
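The dimension validation described for strToArray can be sketched as a parse-then-check step. The function below is a hypothetical stand-in, not the PR's implementation: it parses a bracketed literal such as "[1, 2, 3]" and reports a mismatch using the error wording quoted in the updated test results. Using a negative expected dimension to skip the check is this example's own convention.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVector parses a "[a, b, c]" literal into float32 values and,
// when wantDim >= 0, verifies that the element count matches it.
func parseVector(s string, wantDim int) ([]float32, error) {
	s = strings.TrimSpace(s)
	if len(s) < 2 || s[0] != '[' || s[len(s)-1] != ']' {
		return nil, fmt.Errorf("malformed vector literal %q", s)
	}
	var out []float32
	if body := strings.TrimSpace(s[1 : len(s)-1]); body != "" {
		for _, field := range strings.Split(body, ",") {
			v, err := strconv.ParseFloat(strings.TrimSpace(field), 32)
			if err != nil {
				return nil, err
			}
			out = append(out, float32(v))
		}
	}
	// A negative wantDim bypasses the check (sketch-only convention).
	if wantDim >= 0 && len(out) != wantDim {
		return nil, fmt.Errorf(
			"expected vector dimension %d != actual dimension %d",
			wantDim, len(out))
	}
	return out, nil
}

func main() {
	v, err := parseVector("[1, 2, 3]", 3)
	fmt.Println(v, err)
	_, err = parseVector("[1, 2]", 3)
	fmt.Println(err)
}
```

Naming both the expected and the actual dimension in the error is what makes the new message easier to act on than a generic "different dimensions" report.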
hnsw_create.go
Update HNSW creation to use generic types
pkg/sql/colexec/table_function/hnsw_create.go
• Updated hnswCreateState to use generic HnswBuild[float32] type
• Modified NewHnswBuild call to use generic float32 type parameter
hnsw.go
Relax HNSW query builder node type constraints
pkg/sql/plan/hnsw.go
• Commented out TABLE_SCAN node type validation
• Removed strict node type checking for HNSW query building
mock_consumer.go
Use system account constant in mock consumer
pkg/iscp/mock_consumer.go
• Updated context creation to use catalog.System_Account instead of hardcoded uint32(0)
• Improved system account constant usage for consistency
types.go
Add vector array type description formatting
pkg/container/types/types.go
• Added DescString method cases for T_array_float32 and T_array_float64
• Returns formatted vector type descriptions like "VECF32(128)" and "VECF64(128)"
hnsw_search.go
Update HNSW search to use generic types
pkg/sql/colexec/table_function/hnsw_search.go
• Updated newHnswAlgoFn to use generic NewHnswSearch[float32] call
• Modified HNSW search algorithm instantiation with float32 type parameter
data_retriever.go
Add account and table ID getters to data retriever
pkg/iscp/data_retriever.go
• Added GetAccountID() and GetTableID() methods to DataRetrieverImpl
• Provides access to account and table identifiers for data retrieval operations
types.go
Add algorithm parameters to multi-table index structure
pkg/sql/plan/types.go
• Added IndexAlgoParams field to MultiTableIndex struct
• Enhanced multi-table index structure to store algorithm parameters
types.go
Extend DataRetriever interface with ID getters
pkg/iscp/types.go
• Added GetAccountID() and GetTableID() methods to DataRetriever interface
• Extended interface to provide access to account and table identifiers
vector_hnsw.result
Update vector dimension error message format
test/distributed/cases/vector/vector_hnsw.result
• Updated error message format for vector dimension mismatch
• Changed from "vector ops between different dimensions" to "expected vector dimension X != actual dimension Y"
vector_index.result
Improve vector index dimension error messages
test/distributed/cases/vector/vector_index.result
• Updated error message format for vector dimension validation
• Improved error message clarity for dimension mismatch scenarios
array.result
Update array dimension error message format
test/distributed/cases/array/array.result
• Updated vector dimension error messages to use clearer format
• Changed error text to "expected vector dimension X != actual dimension Y"
15 files
index_consumer_test.go
Add comprehensive test suite for index consumer
pkg/iscp/index_consumer_test.go
• Adds comprehensive test suite for index consumer functionality
• Includes mock implementations for data retrieval, SQL execution, and error handling
• Tests both snapshot and tail data processing scenarios for HNSW indexes
• Validates SQL generation and CDC operation handling
sync_test.go
Add comprehensive HNSW CDC synchronization test suite
pkg/vectorindex/hnsw/sync_test.go
• Added comprehensive test suite for HNSW CDC synchronization functionality
• Tests cover various scenarios: empty sync, upsert, delete, mixed operations, and multi-file handling
• Includes mock functions for SQL execution and streaming operations
• Tests handle shuffled data and large datasets (up to 1M entries)
index_sqlwriter_test.go
Add comprehensive index SQL writer test suite
pkg/iscp/index_sqlwriter_test.go
• Added test suite for index SQL writers (fulltext, HNSW, IVF-flat)
• Tests cover insert, upsert, delete operations for different index types
• Includes tests for composite primary keys and various data types
• Validates SQL generation for different vector index algorithms
util_test.go
Add utility tests for data type conversion and SQL generation
pkg/iscp/util_test.go
• Added utility tests for data type conversion and SQL generation
• Tests various data types including JSON, arrays, dates, decimals, UUIDs
• Validates proper SQL formatting for different vector and scalar types
• Includes comprehensive type conversion validation
search_test.go
Enhance HNSW search tests with multi-file support
pkg/vectorindex/hnsw/search_test.go
• Enhanced existing search tests with multi-file support
• Added mock functions for 2-file scenarios and catalog operations
• Extended test coverage for metadata and index batch creation
• Added utility functions for creating test batches with different file configurations
model_test.go
Add comprehensive HNSW model test suite
pkg/vectorindex/hnsw/model_test.go
• Added comprehensive test suite for HNSW model operations
• Tests model loading, searching, adding/removing vectors, and SQL generation
• Includes error handling tests for nil model scenarios
• Validates model state management and persistence operations
func_hnsw_test.go
Add HNSW CDC update function test suite
pkg/sql/plan/function/func_hnsw_test.go
• Added test suite for HNSW CDC update function
• Tests various error conditions and parameter validation
• Validates null parameter handling and JSON parsing
• Ensures proper error handling for invalid inputs
build_test.go
Update HNSW build tests for generic type system
pkg/vectorindex/hnsw/build_test.go
• Updated build tests to use generic HNSW types
• Modified test functions to work with HnswModel[float32] instead of HnswSearchIndex
• Updated constructor calls to use generic type parameters
• Maintained existing test functionality with new type system
mysql_sql_test.go
Add ASYNC keyword test cases for index creation
pkg/sql/parsers/dialect/mysql/mysql_sql_test.go
• Added test cases for ASYNC keyword in fulltext and vector index creation statements
• Updated expected output to include ASYNC keyword in uppercase format
types_test.go
Add vector index CDC operations test suite
pkg/vectorindex/types_test.go
• New test file for vector index CDC operations
• Tests Insert, Delete, Upsert operations and JSON serialization
• Validates CDC state management and JSON output format
vector_ivf_async.result
IVF async vector index test results
test/distributed/cases/vector/vector_ivf_async.result
• Test results for IVF vector index with ASYNC keyword functionality
• Validates async index creation, data insertion, and vector similarity queries
• Demonstrates proper async index behavior with sleep delays
vector_ivf_async.sql
IVF async vector index test cases
test/distributed/cases/vector/vector_ivf_async.sql
• Test cases for IVF vector indexes with ASYNC keyword
• Tests index creation, data loading, and vector similarity searches
• Includes sleep statements to allow async operations to complete
vector_hnsw_async.result
HNSW async vector index test results
test/distributed/cases/vector/vector_hnsw_async.result
• Test results for HNSW vector index with ASYNC functionality
• Validates async HNSW index creation and vector operations
• Shows proper handling of CDC updates and similarity searches
fulltext_async.sql
Fulltext async index test cases
test/distributed/cases/fulltext/fulltext_async.sql
• Test cases for fulltext index with ASYNC keyword support
• Tests async fulltext index creation and search functionality
• Includes multilingual content and null value handling
fulltext_async.result
Fulltext async index test results
test/distributed/cases/fulltext/fulltext_async.result
• Test results for async fulltext index functionality
• Validates async fulltext search operations and result accuracy
• Shows proper handling of multilingual and null content
1 files
build.go
Refactor HNSW build to use generic types and shared model
pkg/vectorindex/hnsw/build.go
• Refactors HNSW build functionality to use generic types with HnswBuild[T types.RealNumbers]
• Removes HnswBuildIndex struct and related methods (moved to model.go)
• Updates function signatures and type definitions to support generic vector types
• Maintains multi-threaded building capabilities with channel-based communication
1 files
function_id_test.go
Add HNSW CDC update function ID
pkg/sql/plan/function/function_id_test.go
• Adds HNSW_CDC_UPDATE function ID (349) to predefined function IDs map
• Updates FUNCTION_END_NUMBER from 349 to 350 to accommodate new function
3 files
watermark_updater.go
Fix ISCP watermark updater null handling bugs
pkg/iscp/watermark_updater.go
• Fixed bug in unregisterJobsByDBName to handle empty tableIDs array
• Corrected null check index in queryIndexLog function
• Added conditional execution to prevent SQL errors
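The empty-array fix is easiest to see with an IN list: an empty set of IDs would render as "IN ()", which is not valid SQL, so the statement has to be skipped instead of executed. The sketch below illustrates that guard only; the function name, the table name job_log, and the column table_id are hypothetical and not taken from the watermark updater.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildDeleteByTableIDs returns a DELETE statement for the given table
// IDs, or ok=false when there is nothing to delete. The table and
// column names are placeholders for illustration.
func buildDeleteByTableIDs(tableIDs []uint64) (sql string, ok bool) {
	if len(tableIDs) == 0 {
		// An empty IN list would be a syntax error, so emit nothing.
		return "", false
	}
	parts := make([]string, len(tableIDs))
	for i, id := range tableIDs {
		parts[i] = strconv.FormatUint(id, 10)
	}
	return "DELETE FROM job_log WHERE table_id IN (" +
		strings.Join(parts, ",") + ")", true
}

func main() {
	if sql, ok := buildDeleteByTableIDs([]uint64{7, 9}); ok {
		fmt.Println(sql)
	}
	if _, ok := buildDeleteByTableIDs(nil); !ok {
		fmt.Println("no table IDs, statement skipped")
	}
}
```

Returning a boolean alongside the SQL makes the caller decide explicitly whether to execute, which is one way to read the "conditional execution" bullet.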
iteration.go
Improve ISCP iteration error handling and context setup
pkg/iscp/iteration.go
• Added error handling for CollectChanges function call
• Enhanced consumer execution with proper tenant context setup
• Fixed context propagation for system account operations
build_ddl.go
Fix fulltext index algorithm parameters processing
pkg/sql/plan/build_ddl.go
• Fixed fulltext index table building to always process IndexAlgoParams
• Removed conditional check that prevented parameter processing
3 files